vehicle routing
Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
Value-function-based methods have long played an important role in reinforcement learning. However, finding the best next action given a value function of arbitrary complexity is nontrivial when the action space is too large for enumeration. We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem. As a motivating example, we present an application of this framework to the capacitated vehicle routing problem (CVRP), a combinatorial optimization problem in which a set of locations must be covered by a single vehicle with limited capacity. On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm. Our approach is competitive with other reinforcement learning methods and achieves an average gap of 1.7% with state-of-the-art OR methods on standard library instances of medium size.
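The action-selection step described in the abstract can be illustrated with a minimal sketch on a toy instance. This is not the authors' implementation: brute-force enumeration stands in for the mixed-integer program, and `make_toy_value` is a hypothetical hand-written placeholder for the learned value function. All function names, the instance data, and the assumption that every customer's demand fits within the vehicle capacity are illustrative only.

```python
from itertools import combinations, permutations
from math import dist


def route_cost(route, coords, depot=0):
    """Length of the closed tour depot -> route stops -> depot."""
    stops = [depot, *route, depot]
    return sum(dist(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))


def make_toy_value(coords, depot=0):
    """Hypothetical stand-in for a learned value function.

    Estimates the cost-to-go as one round trip per unvisited customer.
    """
    def value(remaining):
        return sum(2 * dist(coords[depot], coords[c]) for c in remaining)
    return value


def best_route(unvisited, coords, demand, capacity, value_fn):
    """Choose the route minimising immediate cost plus estimated cost-to-go.

    Enumeration replaces the mixed-integer formulation and is only
    viable for very small instances.
    """
    best = None
    for k in range(1, len(unvisited) + 1):
        for subset in combinations(sorted(unvisited), k):
            if sum(demand[c] for c in subset) > capacity:
                continue  # route would exceed the vehicle capacity
            remaining = frozenset(unvisited) - frozenset(subset)
            for order in permutations(subset):
                total = route_cost(order, coords) + value_fn(remaining)
                if best is None or total < best[0]:
                    best = (total, order, remaining)
    return best


def solve(coords, demand, capacity, value_fn):
    """Build a solution one route (one action) at a time.

    Assumes each customer's demand is at most the capacity, so a
    feasible single-stop route always exists.
    """
    unvisited = frozenset(range(1, len(coords)))
    routes, total = [], 0.0
    while unvisited:
        _, route, unvisited = best_route(
            unvisited, coords, demand, capacity, value_fn
        )
        routes.append(list(route))
        total += route_cost(route, coords)
    return routes, total


# Toy instance: depot at index 0, four unit-demand customers, capacity 2.
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
demand = [0, 1, 1, 1, 1]
routes, total = solve(coords, demand, 2, make_toy_value(coords))
print(routes, total)
```

In the framework the abstract describes, the enumeration would be replaced by a mixed-integer optimization over routes, and the placeholder value estimate by a trained network that is refined through policy iteration.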
Review for NeurIPS paper: Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
Summary and Contributions: - New reinforcement learning algorithm to solve the capacitated vehicle routing problem. However, there are some observations for the Machine Learning community that are of some interest. There is an enduring interest in the reinforcement learning community in investigating ways in which reinforcement learning technologies can play a role in hard combinatorial optimisation settings. Here, following the cited 2018 NeurIPS publication by Nazari et al., the authors of the submitted manuscript develop and evaluate a novel reinforcement learning approach for the capacitated vehicle routing problem (CVRP). The CVRP is a hard combinatorial problem class that includes the Travelling Salesperson Problem.
Review for NeurIPS paper: Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
The paper proposes a novel reinforcement learning approach to solving the capacitated vehicle routing problem. It involves learning a value function and solving a TSP for the prizing problem. Reviewers agree that the proposed approach is novel and interesting. One reviewer is sceptical of the work because of doubts about the performance achievable with the proposed approach. However, the ideas presented still deserve to be presented at NeurIPS, with the hope of bringing advances to this research area.
Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning
Peng, Xianyue, Gao, Hang, Han, Gengyue, Wang, Hao, Zhang, Michael
Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performance by simultaneously controlling signal timings and route choices using Multi-Agent Deep Reinforcement Learning (MADRL). Signal control agents (SAs) are employed to establish signal timings at intersections, whereas vehicle routing agents (RAs) are responsible for selecting vehicle routes. By establishing relevance between agents and enabling them to share observations and rewards, interaction and cooperation among agents are fostered, which enhances individual training. The Multi-Agent Advantage Actor-Critic algorithm is used to handle multi-agent environments, and Deep Neural Network (DNN) structures are designed to facilitate the algorithm's convergence. Notably, our work is the first to utilize MADRL in determining the optimal joint policy for signal control and vehicle routing. Numerical experiments conducted on the modified Sioux network demonstrate that our integration of signal control and vehicle routing outperforms controlling signal timings or vehicles' routes alone in enhancing traffic efficiency.
Key words: traffic congestion; signalized road networks; vehicle routing; signal control; multi-agent deep reinforcement learning
1. Introduction
Traffic signal control and vehicle routing are recognized as effective measures to alleviate traffic congestion and enhance traffic efficiency in urban road networks. The signal settings are intrinsically linked to the route decisions made by drivers. This is because the traffic control system is designed to improve network performance, which is accomplished through a comprehensive analysis of the network flow patterns that encompasses drivers' route decisions.
Unlocking Carbon Reduction Potential with Reinforcement Learning for the Three-Dimensional Loading Capacitated Vehicle Routing Problem
Schoepf, Stefan, Mak, Stephen, Senoner, Julian, Xu, Liming, Netland, Torbjörn, Brintrup, Alexandra
Heavy goods vehicles are vital backbones of the supply chain delivery system but also contribute significantly to carbon emissions with only 60% loading efficiency in the United Kingdom. Collaborative vehicle routing has been proposed as a solution to increase efficiency, but challenges remain to make this a possibility. One key challenge is the efficient computation of viable solutions for co-loading and routing. Current operations research methods suffer from non-linear scaling with increasing problem size and are therefore bound to limited geographic areas to compute results in time for day-to-day operations. This only allows for local optima in routing and leaves global optimisation potential untouched. We develop a reinforcement learning model to solve the three-dimensional loading capacitated vehicle routing problem in approximately linear time. While this problem has been studied extensively in operations research, no publications on solving it with reinforcement learning exist. We demonstrate the favourable scaling of our reinforcement learning model and benchmark our routing performance against state-of-the-art methods. The model performs within an average gap of 3.83% to 8.10% compared to established methods. Our model not only represents a promising first step towards large-scale logistics optimisation with reinforcement learning but also lays the foundation for this research stream.